Modified eQTL and Somatic DNA Segment Alterations in Esophageal Squamous Cell Carcinoma for Genes Related to Immunity, DNA Repair, and Inflammation

Simple Summary We applied an integrated approach to analyze expression, genotyping and somatic DNA alterations to find genetic markers (genes and SNPs) related to esophageal squamous cell carcinoma (ESCC). We extended the expression-quantitative trait loci (eQTL) analysis by using tumor vs. normal fold change data. By analyzing both RNA and DNA data from multiple platforms and focusing on the genes in three pathways: inflammation, DNA repair, and immunity, we have found results more relevant to ESCC. Abstract We integrated ESCC expression and GWAS genotyping, to investigate eQTL and somatic DNA segment alterations, including somatic copy number alteration, allelic imbalance (AI), and loss of heterozygosity (LOH) in ESCC. First, in eQTL analysis, we used a classical approach based on genotype data from GWAS and expression signals in normal tissue samples, and then used a modified approach based on fold change in the tumor vs. normal samples. We focused on the genes in three pathways: inflammation, DNA repair, and immunity. Among the significant (p < 0.05) SNP-probe pairs from classical and modified eQTL analyses, 24 genes were shared by the two approaches, including 18 genes that showed the same numbers of SNPs and probes and 6 genes that had the different numbers of SNPs and probes. For these 18 genes, we found 28 SNP–probe pairs were correlated in opposite directions in the two approaches, indicating an intriguing difference between the classical and modified eQTL approaches. Second, we analyzed the somatic DNA segment alterations. Across the 24 genes, abnormal gene expression on mRNA arrays was seen in 19–95% of cases and 26–78% showed somatic DNA segment alterations on Affymetrix GeneChip Human Mapping Arrays. The results suggested that this strategy could identify gene expression and somatic DNA segment alterations for biological markers (genes) by combining classical and modified eQTLs and somatic DNA evaluation on SNP arrays. Thus, this study approach may allow us to understand functionality indicative of potentially relevant biomarkers in ESCC.


Introduction
Esophageal cancer is the sixth most common cause of cancer death in the world [1]. An estimated 572,034 new esophageal cancer cases and 508,585 deaths occurred in 2018 worldwide, and the new cases and deaths both increased 4% each year between 2012 and 2018 [2,3]. There are two main histologic types of esophageal cancer-esophageal squamous

Materials and Methods
Briefly, cases diagnosed with ESCC between 1994 to 2007 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, and considered candidates for curative surgical resection were identified and recruited to participate in this study after obtaining informed consent. None of the cases had prior therapy, and Shanxi was the ancestral home for all patients.
To better understand the relations among SNPs/genes identified from cancer GWAS and eQTL and somatic DNA segment alterations in tumors, we integrated three types of data: GWAS results from germline DNA [14], genome-wide array results of mRNA expression from tumor and matched normal tissue [22], and genome-wide SNP array results from DNA of tumor and germ line. For this analysis, we focused our evaluation on genes in three pathways: inflammation, DNA repair, and immunity. First, we extracted data from ESCC patients and their matched neighborhood controls from our GWAS study on ESCC [14]. By using the KEGG, Biocarta, and GO databases, we identified 1805 genes in inflammation pathways, 1125 genes in DNA repair pathways, and 735 genes in immunity pathways (Table S6). We selected all loci in these genes, plus 20 Kb around each gene. We also mapped all selected loci on genes present on the Affymetrix Human Genome U133 arrays. Second, we performed eQTL analyses separately by using signals for normal tissue only (called classical) and tumor vs. normal fold change (called modified) in 100 ESCC cases from our previous studies [22] (with GEO accession number GSE44021 for these mRNA array data) for the genes on the three pathways (see Appendix A). Third, we compared significant eQTLs from classical and modified strategies to find loci/genes shared by both approaches. Finally, we examined somatic DNA segment alterations (somatic copy number alterations, LOH, allelic imbalance) in shared genes by classical and modified eQTL analyses in ESCC cases (n = 76) by using GeneChip Human Mapping Arrays (GEO accession number for the 500K and SNP 5.0 arrays is GSE74705, and for the SNP 6.0 array, GSE128695). Details of the methods used are described in Appendix A.

Patient Information
Initially, genome-wide genotyping was performed on a large group of ESCC cases and controls from Shanxi as part of a larger upper gastrointestinal (UGI) cancer GWAS [14]. We have U133 mRNA expression data for 100 ESCC cases, which are a subset of the GWAS samples. Availability of both gene expression and genotype data allowed us to perform eQTL analyses. Characteristics of the 100 ESCC patients evaluated here are summarized in Table S1 as follows: Cases ranged in age from 39 to 71 years (median 58 years) and were predominantly male (60%). Around 26% cases had a positive UGI cancer family history. Clinically, nearly all cases had Stage II tumors (96%) while 46% had evidence of lymph nodes metastasis at diagnosis. Patient survival times ranged from 1.1 months to 67.7 months (median 23.3 months).

Pathway-Based Analyses
From the ESCC GWAS study, we had genotyping data on 550K SNPs for 1423 ESCC cases and 1660 controls. For each SNP, we estimated the odds ratio (OR) and its 95% confidence interval (CI) by using a generalized linear model with adjustment for age and gender. We searched genes in the pathways related to inflammation, immunity, and DNA repair from pathway databases KEGG, Biocarta, and GO. Among the 550K SNPs, 31K were in these genes (or within the 20 Kb 5 upstream to 20 Kb 3 downstream window around them). A total of 1587 SNPs showed a nominally significant association with ESCC (p < 0.05 and 95% CI did not include OR = 1). Restricting SNPs to genes also on the Affymetrix U133 array reduced the number of significant SNPs from 1587 to 1233 related to 864 genes in the three pathways ( Figure 1).    Although none of the correlations between SNPs and gene expression was significant after Bonferroni correction (0.05/1233 = 4.06 × 10 −5 , there were 131 nominally significant correlations between genotypes of 104 SNPs and expressions of 70 genes with 93 probes in classical eQTL analysis ((a) in Table S2). These eQTLs included 57 positive and 74 negative correlations. Among the positive correlations, the gene-SNP pair CD46 (probe 208783_s_at) and rs7144 had a rho = 0.31 with the smallest p = 0.0017; among the negative correlations, the pair CASP8 (probe 213373_s_at) and rs10931936 had a rho = −0.365 and the smallest p = 0.0002. Among these eQTLs are several genes with interesting functions, for example ERCC3 and PARD3. DNA repair gene ERCC3 (excision repair cross-complementing), with its product specifically corrected the defect in an early step of the DNA nucleotide excision repair (NER) pathway in UV-sensitive rodent mutants of complementation group 3. ERCC3 (probe 202176_at) was positively correlated with the SNP rs1143407 (rho = 0.26, p = 0.009), and the tumor suppressor gene PARD3 (partitioning-defective protein 3), (probe 221280_s_at) was negatively correlated with the SNP rs2496720 (rho = 0.26, p = 0.009) ( Figure S1a).
There were three patterns among the 131 correlations. With regard to a single gene, we observed correlation(s) between (1) one probe and one SNP; (2) multiple probes and one or two SNPs (i.e., expression of three probes of KLK2 correlated with one SNP); and (3) one probe and multiple SNPs (i.e., CD226, one probe was significantly correlated with five SNPs) ((a) in Table S2).  Table S2). Some of these 56 genes had multiple eQTLs. For example, CDKN2A had two probes (209644_s_at and 207039_at) positively correlated with the SNP rs3731239 (rho = 0.311 and 0.283, respectively), and one probe (211156_at) positively correlated with SNP rs2811708 (rho = 0.230). Furthermore, two probes of BCL2L11 (208536_s_at and 222343_at) were negatively correlated with the SNP rs724710 (rho = 0.274 and −0.246, respectively).

Shared Significant eQTLs in the Classical and Modified Approaches
Among the significant classical (131 eQTLs) and modified (114 eQTLs) eQTLs analyses were 28 eQTL pairs with the same probes and SNPs shared by the two analyses but with effects in opposite directions ((a) in Table 1). These eQTLs included 18 pairs with negative rhos in the classical but positive rhos in the modified approach, and 10 pairs with positive rhos in the classical but negative rhos in the modified eQTL analysis. For example, DAPK1 with probe 211214_s_at and SNP rs1964911 had rho= -0.294 in classical eQTL but rho = 0.218 in modified (Figure 2a,b). Similarly, ST6GAL1 (beta-galactosamide alpha-2,6-sialyltranferase 1) on 3q27 with probe 214971_s_at and SNP rs12495026 also showed opposite directions with rho = −0.242 in classical eQTL ( Figure 3a) and 0.22 in modified eQTL (Figure 3b). The same phenomenon was also observed in a gene having one SNP with two probes. For example, SNP (rs2496720) in PARD3 was associated with two probes (221280_s_at and 210094_s_at) whose rho values in classical eQTL (−0.262 and −0.215, respectively) were in the opposite direction as in modified eQTL (0.247 and 0.23, respectively) ( Figure S1a (221280_s_at and 210094_s_at) whose rho values in classical eQTL (−0.262 and −0.215, respectively) were in the opposite direction as in modified eQTL (0.247 and 0.23, respectively) ( Figure S1a,b are plots for 2 of the 4 correlations). There were 18 genes involved in the 28 pairs of eQTLs, including six tumor suppressor genes (DAPK1, PTPRM, GLS2, RARB, TCF7L2, ZBTB16).

Somatic DNA Segment Alterations (Copy Number (CN) Alterations, Allele Imbalance (AI) and LOH) in Genes Shared by the Significant eQTLs from the Two Approaches
We performed advanced investigation of somatic DNA segment alterations at the gene level by using DNAs from tumor and germ line samples. We focused on the genes with significant classical (70 genes, (a) in Table S2) and modified (56 genes, (b) in Table S2) eQTLs. Twenty-four genes were found to be shared by these two groups, including 18 genes already found in the 28 eQTL pairs with the same probes and SNPs ((a) in Table 1) and six genes with different probes and SNPs ((b) in Table 1).    These 24 genes are closely related to ESCC in two aspects. First, 19 of them (Table S5) were differentially expressed genes (DEGs) comparing tumor vs. normal with FDR < 0.05. Second, the survival analysis showed that 7 genes had significant (p < 0.05) Kaplan-Meier curves, including 5 genes (MS4A1, N4BP2L1, IGF1R, TCF7L2 and COL11A1) based on the normal expression, and 2 genes (TCF7L1 and PTPRM) based on the tumor expression. Based on the tumor/normal fold change data, the gene IGF1R was found to be significant with a Kaplan-Meier p-value of 0.009. The significant Kaplan-Meier plots for three genes-TCF7L1, TCF7L2 and IGF1R, based on normal expression, tumor expression, and tumor to normal fold change-were shown in Figure S2. Table S3 shows detailed information (pathway involved, and biologic function) on the 24 genes. Fifteen of the 24 genes involved only one of the three pathways under study (inflammation, immunity, and DNA repair), and nine involved two of the three pathways. Table 2 shows gene expression status and somatic DNA segment alterations of the 24 genes. Among 76 ESCC cases, 19-95% cases had abnormal (over or under) expression across the 24 genes. In addition, 26-78% cases carried DNA segment alterations in the 24 genes. For example, 33% of cases showed overexpression and 68% cases had copy number gains in ST6GAL1. Some genes with mixed CN gain and loss were observed. For example, MS4A1 showed more cases with CN gain than loss. ITPR1 showed more cases with CN loss than gain, and DAPK1 showed CN gain and loss plus mixed LOH. We noted that two genes (CD46, ST6GAL1) showed CN gain only (Figure 4a is an example for CN gain on ST6GAL1), whereas three genes (CYP2C18, CYP2C9, RARB) showed more CN loss (Figure 4b is an example for CN loss on RARB) (Table S4). Furthermore, we noticed that for the 24 genes, the AI ranged from 4% (ST6GAL1 in 3/76) to 43% (PDCD1LG2 in 33/76) of the cases with AI only ( Table 2). Table 2. Cross-sample characteristics of gene expression based on mRNA array (n = 100) and somatic alterations based on SNP array (n = 76) for the 24 genes in ESCC.

No
Gene Cytoband We also checked somatic DNA segment alterations in the (32 of 56) genes identified by modified eQTL, which were not shared with classical eQTL, and DNA segment alterations were observed with a range from 26% (ATF6 in 20/76) to 82% (CDKN2A in 62/76) of cases (Table S4). Overall, the results indicated that genes/SNPs selected from classical and modified eQTLs also carried DNA somatic alterations.

Discussion
By focusing on genes in the three pathways-inflammation, DNA repair, and immunity-and integrating RNA and DNA data, we have obtained results more relevant to ESCC. Classical eQTLs may provide a crucial link between the variants from GWAS research and the biological processes they affect. However, because most of these SNPs are located on non-coding regions of genes, and although the identification of variants that affect phenotypes is rapidly progressing, the current fundamental challenge is to understand how these variants exert their effects. Thus, we applied an approach by combining classical and modified eQTLs to find shared SNP-probes, which were on genes involved in the three pathways (inflammation, DNA repair, immunity). Paired tumor/normal data from the same individuals were used to obtain the modified eQTLs to control for interindividual genetic differences. We also investigated somatic DNA

Discussion
By focusing on genes in the three pathways-inflammation, DNA repair, and immunity-and integrating RNA and DNA data, we have obtained results more relevant to ESCC. Classical eQTLs may provide a crucial link between the variants from GWAS research and the biological processes they affect. However, because most of these SNPs are located on non-coding regions of genes, and although the identification of variants that affect phenotypes is rapidly progressing, the current fundamental challenge is to understand how these variants exert their effects. Thus, we applied an approach by combining classical and modified eQTLs to find shared SNP-probes, which were on genes involved in the three pathways (inflammation, DNA repair, immunity). Paired tumor/normal data from the same individuals were used to obtain the modified eQTLs to control for interindividual genetic differences. We also investigated somatic DNA segment alterations for the genes shared by the two kinds of eQTLs from a subset (n = 76) of these ESCC cases by using DNAs from tumor and germline samples using an Affymetrix SNP array. For example, CN variations defined as DNA segments that are 1 kb or larger in size present at variable copy number in comparison with a reference genome and have attracted much attention. It is generally accepted that a somatic CN alteration is highly associated with the development and progression of numerous cancers through its impact on gene expression levels positively or negatively [23][24][25]. Furthermore, AI can arise from the complete loss of an allele or an increase in copy number of one allele relative to the other.
Among the significant classical (131 eQTLs) and modified (114 eQTLs) eQTLs analyses, there were 28 eQTL pairs shared by the two eQTL analyses, but interestingly the 28 pairs of correlations were in opposite directions. The two analyses differed in that the signal of normal mRNA was used in classical eQTL to evaluate associations between SNP-probe(s) in normal tissue whereas the modified eQTL used fold change to connect SNP and probe/gene expressions in tumor. Thus, the opposite direction of the associations observed suggested that the SNP-probe pairs identified in classical eQTL also could potentially influence gene expression in tumor directly/indirectly, suggesting that these SNP-probe pairs may play a role related to gene expression during the development of ESCC. Among the 24 genes ((a), (b) in Table 1) shared by the two eQTL analyses, 79% of them are DEGs and 7 genes had significant (p < 0.05) Kaplan-Meier curves. Figure S2 showed the three significant Kaplan-Meier plots for three genes: TCF7L1, TCF7L2, and IGF1R. The patients with higher expression in these three genes had better survival rates. This was consistent with the gene TCF7L2 being one of the tumor suppressor genes. We further examined somatic DNA segment alterations of the 24 shared genes and found that the frequencies of somatic DNA segment alterations, including CN alterations, AI and LOH, in these genes ranged from 26% to 78%. The results indicated that the SNP-probes of the 24 genes in the three pathways may be related to ESCC, although we cannot identify CN gain status for each individual SNP. For example, ST6GAL1 showed somatic CN gain in 68% and gene overexpression in 33% of 76 ESCC examined ( Table 2). ST6GAL1 is a member of the family of sialyltransferases, which catalyze the transfer of sialic acids to terminal positions of carbohydrate groups of glycoproteins and glycolipids. ST6GAL1 expression levels have been shown to be upregulated in numerous types of cancers, including pancreatic, prostate, breast, and ovarian cancers, and has been correlated with high tumor grade, metastasis, and reduced patient prognosis in several studies [26]. Aberrant glycosylation is a universal feature of cancer cells, and there is now overwhelming evidence that glycans can modulate pathways intrinsic to tumor cell biology. To date, ST6GAL1 CN gain in ESCC has not been reported in the literature. Thus, the results by using our approach may contribute insight into somatic DNA segment alterations of genes identified based on eQTL studies related to ESCC, in additional to mutations of driver genes.
Our work has several limitations. First, each probe on the mRNA array covered only a small coding region of the gene so that the results do not reflect the full gene expression status (like RNA sequencing). Second, the study had a relatively small sample size resulting in small case numbers in each of the three genotypes. Finally, it was not possible to examine somatic mutations of the 24 genes identified.
In summary, by combining classical and modified eQTL data on genes in three pathways (inflammation, DNA repair, and immunity), we identified 24 genes shared by the two eQTL analyses. The results suggested that by connecting results from classical and modified eQTLs and somatic DNA alterations, it may be possible to better understand functionality indicative of the potentially relevant biomarkers in ESCC by the integrated approach.

Conclusions
By combining classical and modified eQTL data on genes in three pathways (inflammation, DNA repair, and immunity), we identified 24 genes shared by the two eQTL analyses. We found that majority of the 24 genes are differentially expressed genes comparing tumor to normal and some of them are useful for ESCC patient prognosis prediction. The results suggested that by connecting results from classical and modified eQTLs and somatic DNA alterations, it may be possible to better understand functionality indicative of the potentially relevant biomarkers in ESCC by the integrated approach.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/cancers14071629/s1, Figure S1: (a) Distribution of expression in normal with a negative rho for the gene-SNP pair PARD3 (221280_s_at) and rs2496720; (b) Distribution of tumor vs. normal fold change with a positive rho for the same gene-SNP pair, Figure S2: Significant Kaplan-Meier plots for three genes: TCF7L1, TCF7L2 and IGF1R based on (a) TCF7L2 expression in normal; (b)TCF7L1 expression in tumor; (c) IGF1R expression in normal; (d) IGF1R tumor/normal fold change; Table S1: Clinical characteristics and risk factors in100 ESCC cases, Table S2: (a) eQTL identified significant 70 genes with 93 probes and 104 SNPs in Normal only at p < 0.05. (b) eQTL identified significant 56 genes with 79 probes and 93 SNPs in Tumor vs Normal group at p < 0.05, Table S3: Information for shared 24 genes by classical and modified eQTLs, Table S4: Somatic DNA segment alterations for 56 genes identified in modified eQTL using SNP array, Table S5: Among the 24 genes in Table 1, we found 19 of them were differentially expressed genes with FDR < 0.05, Table S6: The lists of genes in the inflammation, DNA repair, and immunity pathways.  Gene expression of paired tumor/normal samples was examined by using Affymetrix Human Genome U133 arrays for 100 ESCC cases [22]. Of these 100 paired samples, 37 pairs used the Human U133A chips, 57 pairs used the U133A version 2.0 chips, and 6 pairs used the U133 Plus2 chips from Affymetrix. Probes were prepared according to the protocol provided by the manufacturer (Affymetrix GeneChip expression analysis technical manual), available from: http://www.affymetrix.com/support/index.affx, accessed on 6 March 2022).
For the Affymetrix U133 array data, raw datasets (CEL files on all samples) after scanning were normalized by using Robust Multi-array Average (RMA) [27] implemented in Bioconductor in R (http://www.bioconductor.org, accessed on 6 March 2022), including background correction and normalization across all samples. For each sample, log2 fold changes in gene expression were calculated by subtracting the adjacent normal RMA value from the corresponding tumor RMA value.